0.1 Abstract

Terrorism is the unlawful use of force or violence against persons or property to intimidate or coerce a government, the civilian population, or any segment thereof, in furtherance of political or social objectives. It targets ethnic or religious groups, governments and political parties, corporations, and media enterprises. Terrorism that occurs throughout the world is known as global terrorism. It is probably the worst type of crime that ever exists. Not only does it kill people, it destroys livelihoods, economies, and civilized world order that took millennia to form. The results of terrorism are almost always catastrophic. Individuals or groups that commit these crimes are called terrorists. Terrorists exist all over the world. There are a few that operate alone, but mostly they are parts of one of many global organizations.

By this project, I want to draw various inferences about the worst hit countries by terrorism over the past years. And which terrorist organisation have caused more damage over the years

0.2 About the dataset

The Global Terrorism Database (GTD) is the most comprehensive unclassified database of terrorist attacks in the world. The National Consortium for the Study of Terrorism and Responses to Terrorism (START) makes the GTD available via this site in an effort to improve understanding of terrorist violence, so that it can be more readily studied and defeated. The GTD is produced by a dedicated team of researchers and technical staff.

The GTD is an open-source database, which provides information on domestic and international terrorist attacks around the world since 1970, and now includes more than 200,000 events. For each event, a wide range of information is available, including the date and location of the incident, the weapons used, nature of the target, the number of casualties, and – when identifiable – the group or individual responsible. Link of the dataset: https://www.start.umd.edu/gtd/access/

0.3 Timeline

Review 1: Select the dataset, Dataset cleaning and preprocessing, perform Data Visualisation using different types of graphs.
Review 2: Use the latitude and longitude variables and perform data visualisation using maps in R studio.
Review 3: Use Tableau for data visualisation and perform required documentation.

1 Data Preparation

1.1 Load libraries

library(tidyverse)
library(data.table)
library(lubridate)
library(RColorBrewer)
library(gridExtra)
library(plotly)
library(ggthemes)
library(wesanderson)
library(leaflet)
library(VIM)

1.2 Load data

dt <- as.tibble(fread("globalterrorismdb_0718dist.csv",
                      na.strings = c("", "NA")))

There are 135 variables in the original data. We’ll select variables that are relatively easy to interpret and have less missing values: year, month, location, number of kill, ransom, suicide…

gbtr <- select(dt, c(1,2,3,4,9,11,12,13,14,15,18,27,28,59,99,113,117))
gbtr$imonth[gbtr$imonth==0] <- NA
gbtr$iday[gbtr$iday==0] <- NA

gbtr2k <- gbtr %>% filter(iyear>=2000)
gbtr2k$imonth[gbtr2k$imonth==0] <- NA
gbtr2k$iday[gbtr2k$iday==0] <- NA

glimpse(gbtr)
## Rows: 181,691
## Columns: 17
## $ eventid     <int64> 9.733093e-313, 9.733093e-313, 9.733143e-313, 9.733143...
## $ iyear       <int> 1970, 1970, 1970, 1970, 1970, 1970, 1970, 1970, 1970, 1...
## $ imonth      <int> 7, NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ iday        <int> 2, NA, NA, NA, NA, 1, 2, 2, 2, 3, 1, 6, 8, 9, 9, 10, 11...
## $ country_txt <chr> "Dominican Republic", "Mexico", "Philippines", "Greece"...
## $ region_txt  <chr> "Central America & Caribbean", "North America", "Southe...
## $ provstate   <chr> NA, "Federal", "Tarlac", "Attica", "Fukouka", "Illinois...
## $ city        <chr> "Santo Domingo", "Mexico city", "Unknown", "Athens", "F...
## $ latitude    <dbl> 18.45679, 19.37189, 15.47860, 37.99749, 33.58041, 37.00...
## $ longitude   <dbl> -69.95116, -99.08662, 120.59974, 23.76273, 130.39636, -...
## $ location    <chr> NA, NA, NA, NA, NA, NA, NA, "Edes Substation", NA, NA, ...
## $ success     <int> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1...
## $ suicide     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ gname       <chr> "MANO-D", "23rd of September Communist League", "Unknow...
## $ nkill       <int> 1, 0, 1, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 1, 0...
## $ nhours      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ ransom      <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...

1.3 some sample values

head(gbtr)
## # A tibble: 6 x 17
##   eventid iyear imonth  iday country_txt region_txt provstate city  latitude
##   <int64> <int>  <int> <int> <chr>       <chr>      <chr>     <chr>    <dbl>
## 1 9.7330~  1970      7     2 Dominican ~ Central A~ <NA>      Sant~     18.5
## 2 9.7330~  1970     NA    NA Mexico      North Ame~ Federal   Mexi~     19.4
## 3 9.7331~  1970      1    NA Philippines Southeast~ Tarlac    Unkn~     15.5
## 4 9.7331~  1970      1    NA Greece      Western E~ Attica    Athe~     38.0
## 5 9.7331~  1970      1    NA Japan       East Asia  Fukouka   Fuko~     33.6
## 6 9.7331~  1970      1     1 United Sta~ North Ame~ Illinois  Cairo     37.0
## # ... with 8 more variables: longitude <dbl>, location <chr>, success <int>,
## #   suicide <int>, gname <chr>, nkill <int>, nhours <dbl>, ransom <int>

1.4 Visualization of missing value

matrixplot(gbtr, sortby = c("nkill"))

aggr(gbtr, labels=names(gbtr),cex.axis = .9)

Variables such as location, nhours, and ransom has large number of missing values. EDA with thses variables will be avoided.

2 Analysis

2.1 Events by year

p <- gbtr %>% mutate(iyear=as.factor(iyear))  %>%
  group_by(iyear) %>% count() %>% 
  ggplot(aes(x=iyear,y=n,group=1)) +
  geom_line(size=1, color="brown")+
  geom_point(color="brown") +
  scale_x_discrete(
    breaks=c("1970", "2000","2008", "2011", "2014","2017")
    ) +
  labs(title = "Event by year", x = "year", y = "count")+
  theme_economist() 
p

There is a rapid increase in terrorist event since year 2000. We’ll seperately observe the trend by the region.

2.2 Overall trend in each region

p4 <- gbtr %>% count(region_txt, iyear) %>% 
  ggplot(aes(iyear, n,color=region_txt)) +
  geom_line(aes(group=region_txt)) +
  labs(title = "Trend by Region", x="year", y="count", color="region")+
  theme_light()
ggplotly(p4)

Hovering over the plot to see region label Middle East & North Africa and South Asia are the regions mainly responsible for the spike in data.

2.3 Events & num. of kills by region

Since there is a steep upward trend since aproximately year 2000, we’ll inspect the period before and after 2000 seperately.

p2 <- gbtr %>% mutate(pd=ifelse(iyear<2000,"before 2000", "after 2000")) %>%
  mutate(pd = factor(pd, levels = c("before 2000", "after 2000")))%>% 
  group_by(region_txt, pd) %>% count() %>%
  ggplot(aes(x=reorder(region_txt, n), y=n))+
  geom_bar(aes( fill=pd), stat= "identity", position = "dodge")+
  labs(title = "Events by region", x = "region", y = "count", fill = "period")+
  theme_economist()+
  scale_fill_manual(values = c("#66b2b2","#006666")) +
  coord_flip() 

p2

  • The region with the most terrorist attack bacame “Middle East & North Africa” after 2000. (“South America” before 2000).

  • “South Asia” saw the largest increase in terrorism since the 70s.

2.4 Number of deaths and number of events

pkr <- gbtr2k %>% filter(!is.na(nkill)) %>%  group_by(region_txt) %>% 
  summarise(ksum=sum(nkill)) %>% 
  ggplot(aes(reorder(region_txt,ksum), ksum))+
  geom_bar(stat = "identity", fill="#2E8B57")+
  coord_flip()+
  labs(title = "Num. of kills by region", subtitle = "without missing values, after 2000", x="region", y="count")+
  theme_economist()
## `summarise()` ungrouping output (override with `.groups` argument)
per <-   gbtr2k %>% group_by(region_txt) %>% count() %>% top_n(10,n) %>% 
  ggplot(aes(x=reorder(region_txt, n), y=n))+
  geom_bar(stat= "identity", fill="#006666")+
  labs(title = "Events by region",subtitle = "after 2000", x = "region", y = "count")+
  theme_economist()+
  coord_flip()
  
  
grid.arrange(pkr,per,ncol=2)

  • South Asia has the largest num. of kills (other than “Sub-Saharan Africa”, “Middle East & North Africa” ) despite the missing values.
  • North America has higher number of kills than Western Europe and South America, even though there is less attacks.

2.5 Events by country

We’ll look at data after year 2000

pec <- gbtr2k %>% group_by(country_txt) %>% count() %>% ungroup() %>%
  top_n(n=20,wt = n) %>% 
  ggplot(aes(reorder(country_txt, n), n))+
  geom_bar(stat = "identity", fill="#21618C") +
  labs(title = "Event by country", subtitle = "after 2000", x = "Country", y = "Count") +
  theme_economist() +
  scale_fill_manual(values = wes_palette(n=4,"Cavalcanti1"))+
  coord_flip()
pec

2.6 Suicide attack

dtscd <- gbtr2k %>% filter(!is.na(suicide)) %>%  group_by(region_txt, suicide) %>% count()  %>%
  ungroup() %>% group_by(region_txt) %>% mutate(pct=n/sum(n)) %>% filter(suicide==1)

ggplot(dtscd, aes(reorder(region_txt, pct), pct*100)) +
  geom_bar(stat = "identity", fill="#5D6D7E")+
  coord_flip()+
  labs(title = "Pct of suicide attack by region", subtitle = "after 2000", x="region",y="%")+
  theme_economist()

2.7 Groups, attacks, and suicide

gbtr %>%filter(gname!="Unknown") %>%  group_by(gname,suicide) %>% summarise(n=n()) %>%
  ungroup() %>% group_by(gname) %>% mutate(sum=sum(n)) %>% ungroup() %>%  top_n(30,sum) %>% 
  ggplot(aes(x=reorder(gname,sum),n, fill=factor(suicide, levels = c(1,0)))) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Groups and attacks", x="groups", y="attacks", fill="suicide") +
  theme_economist_white() +
  scale_fill_manual(values = wes_palette(n=2, "Cavalcanti1"))
## `summarise()` regrouping output by 'gname' (override with `.groups` argument)

Disregarding the “Unknown” groups

  • Taliban, ISIL, SL is reponsible for the most attacks.
  • ISIL carried out the most suicide attacks.(23%)

2.8 Attack type by region

wp <- dt %>% select(1,2,3,4,9,11,13,14,15,27,28,30,59,83,85,99,102,117)
wp$imonth[wp$imonth==0] <- NA
wp$iday[wp$iday==0] <- NA
patkrg<- wp %>% group_by(region_txt, attacktype1_txt) %>% count() %>% 
  ggplot(aes(region_txt, n, fill=attacktype1_txt)) +
  geom_bar(stat = "identity",position = "stack")+
  scale_fill_manual(values = wes_palette("Darjeeling1" ,n=9, type="continuous"))+
  theme_economist()+
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 0.8))+
  labs(title = "Attack type by region",
       x="region", y="num.", fill="attack type")

patkrg2<- wp %>% group_by(region_txt, attacktype1_txt) %>% count() %>% 
  ggplot(aes(region_txt, n, fill=attacktype1_txt)) +
  geom_bar(stat = "identity",position = "fill")+
  scale_fill_manual(values = wes_palette("Darjeeling1" ,n=9, type="continuous"))+
  theme_economist()+
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 0.8))+
  labs(title = "Attack type by region",
       x="region", y="num.", fill="attack type")

patkrg

patkrg2

  • Western Europe has more “Facility/Infrastructure Attack” (in number) than any other region.
  • Bombing/Explosion is the most common attack type in Middle East & North Africa.

2.9 Attack type by group

Different groups might prefer different types of attack method. There are 3537 groups in the data. We’ll look at the groups with the most attacks.

wp %>% filter(gname %in% grp$gname)%>% 
  group_by(gname, attacktype1_txt) %>% count() %>% 
  ggplot(aes(gname, n, fill= attacktype1_txt))+
  geom_bar(stat = "identity",position = "stack")+
  scale_fill_manual(values = wes_palette("Darjeeling1",n=9, type="continuous"))+
  theme_economist()+
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 5))+
  labs(title = "Attack type by groups", subtitle = "Groups with the most attacks",
       x="groups", y="pct", fill="attack type")

  • Armed assault is common in most groups except IRA which prefers assassination next to bombing.
  • Bombing is the most used attack type by ISIL.
  • 34% of ISIL’s bombing attack is suicide attack.
wp %>% filter(attacktype1_txt=="Bombing/Explosion" & gname %in% grp$gname ) %>% 
  group_by(gname, suicide) %>% count() %>% ungroup() %>% group_by(gname) %>% mutate(pct=n/sum(n)) %>% filter(suicide==1) %>% arrange(desc(pct))
## # A tibble: 6 x 4
## # Groups:   gname [6]
##   gname                                         suicide     n     pct
##   <chr>                                           <int> <int>   <dbl>
## 1 Boko Haram                                          1   435 0.520  
## 2 Islamic State of Iraq and the Levant (ISIL)         1  1258 0.342  
## 3 Taliban                                             1   646 0.225  
## 4 Al-Shabaab                                          1   149 0.122  
## 5 Kurdistan Workers' Party (PKK)                      1    26 0.0355 
## 6 Revolutionary Armed Forces of Colombia (FARC)       1     2 0.00216

2.10 Number of death by attack type and region

wp %>%  filter(!is.na(nkill)&attacktype1_txt!="Unknown") %>%
  group_by(region_txt,attacktype1_txt) %>% 
  summarise(sumk=sum(nkill), event=n(), kperattack=sum(nkill)/n()) %>% 
    ggplot(aes(reorder(attacktype1_txt, kperattack), kperattack))+
  geom_bar(aes(fill=region_txt), stat = "identity")+
  coord_flip()+
  facet_wrap(.~ region_txt, ncol = 4, scales = "free_x")+
  labs(title = "num. of death by attack type and region", x="attack type", y="death per event")+
  scale_fill_manual(values = wes_palette("Darjeeling1", n=12, type = "continuous"))+
  theme(legend.position = "none")
## `summarise()` regrouping output by 'region_txt' (override with `.groups` argument)

  • Types of attack that cause the most death/attack is drastically different from region to region.

  • Bombing (to my surprise) isn’t responsible for the most death/attack. Instead it’s armed assault and hostage taking in most region.

  • Hostage taking has the most death/attack in East Asia, Eastern Europe, Middle East & North Africa, South Asia, Southeast Asia, Sub-Saharan Africa and Western Europe.

  • North America’s extreme data reflects 9/11 attacks on 2001, with nearly 3,000 recorded deaths in 4 attacks.